
--filter-name "ReadPostRankSum-20" \

-O filteredVCF/hardfilteredIndels.vcf

4.3  VISUALIZING VARIANTS

The variants in a VCF file can be visualized in a variant viewer such as IGV (Integrative Genomics Viewer), an open-source program available for all major platforms. It can be downloaded from https://software.broadinstitute.org/software/igv/download and installed on a local computer. Figure 4.7 shows the allele fractions and genotypes for each of the InDels and SNPs of the samples. Dark blue indicates a heterozygous genotype, cyan indicates a homozygous variant genotype, and gray indicates a genotype identical to the reference (homozygous reference). Refer to the IGV documentation to read more about this.
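The color coding described above can be mirrored in a few lines of code, which may help when checking what a viewer displays against the raw VCF. The following is a minimal sketch, assuming genotypes are given as VCF GT strings such as 0/1 or 1|1; the function name classify_genotype and the IGV_COLOR mapping are illustrative and are not part of IGV or any library.

```python
def classify_genotype(gt: str) -> str:
    """Classify a VCF GT value such as '0/1', '1|1', or './.'.

    Hypothetical helper for illustration; allele '0' is the reference.
    """
    # Phased ('|') and unphased ('/') genotypes are treated the same way.
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "no-call"
    if all(a == "0" for a in alleles):
        return "hom-ref"
    if len(set(alleles)) == 1:
        return "hom-var"
    return "het"


# Colors as described in the text for the genotype display.
IGV_COLOR = {"het": "dark blue", "hom-var": "cyan", "hom-ref": "gray"}

if __name__ == "__main__":
    for gt in ["0/0", "0/1", "1|1", "./."]:
        kind = classify_genotype(gt)
        print(gt, kind, IGV_COLOR.get(kind, "not colored in this sketch"))
```

A genotype with two different alternate alleles, such as 1/2, is reported as heterozygous by this sketch.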

4.4  VARIANT ANNOTATION AND PRIORITIZATION

Variant calling with any of the variant callers, such as bcftools, FreeBayes, or GATK, and variant filtering are followed by variant annotation and prioritization. Variant annotation involves adding information and knowledge to high-confidence variants to help assess which variants are likely to impact function. Following the variant calling workflow, we obtain high-quality variants in a single VCF file that includes the genotypes of all samples. Since variant discovery usually involves the whole genome or whole exome of one or more individuals of a species, thousands of variants may be discovered. However, we are usually interested in the variants that affect function or are associated with diseases or other important phenotypes. Variants may impact function in different ways depending on their type. Variants can occur anywhere in the genome sequence, but the most deleterious and damaging are the ones that affect the function of a gene. A variant may suppress, inhibit, or activate a gene if it affects the gene's regulatory region. These kinds of effects are often seen in cancer cells, in which mutations may lead to hyperactivity of proto-oncogenes, which accelerates cell growth and division, or to inactivation of tumor suppressor genes. Variants that affect the coding region of a gene may have an impact that depends on the consequence of the change. A single-nucleotide variant (SNV) that creates a premature stop codon (stop gain) will produce a truncated protein that does not function normally. On the other hand, an SNV in a stop codon may lead to a stop loss that results in a longer protein. A variant in a splicing region may also alter the sequence and function of the protein. More often, SNVs in the coding region of a gene substitute a different amino acid, which may change the characteristics and function of the translated protein. These SNVs are known as missense SNVs, and they are the variants whose effects are easiest to predict. An SNV can also be synonymous, meaning that it changes the codon but not the amino acid. Although this kind of variant does not change the protein sequence, it may still have a biological consequence. Insertion or deletion of one or more nucleotides in the coding region may lead to a frameshift when the number of inserted or deleted nucleotides is not a multiple of three, and hence the protein will be translated incorrectly from that point.
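The coding consequences described above (stop gain, stop loss, missense, synonymous, and frameshift) can be illustrated with a small sketch. It assumes the standard genetic code and a single codon taken in the reading frame of the transcript; the function names classify_snv and classify_indel are hypothetical and do not correspond to any annotation tool. Real annotators also take transcript models, strand, and splice sites into account, which this sketch ignores.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon: str, pos: int, alt_base: str) -> str:
    """Classify an SNV at 0-based position `pos` within a reference codon."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos] + alt_base.upper() + ref_codon[pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # codon changes, amino acid does not
    if alt_aa == "*":
        return "stop-gain"    # premature stop codon, truncated protein
    if ref_aa == "*":
        return "stop-loss"    # stop codon lost, longer protein
    return "missense"         # a different amino acid is encoded


def classify_indel(ref: str, alt: str) -> str:
    """Classify a coding insertion or deletion by its length change."""
    # A length change that is not a multiple of three shifts the reading frame.
    return "frameshift" if (len(alt) - len(ref)) % 3 else "in-frame"


if __name__ == "__main__":
    print(classify_snv("TGG", 2, "A"))   # TGG (Trp) to TGA (stop)
    print(classify_snv("TAA", 0, "C"))   # TAA (stop) to CAA (Gln)
    print(classify_snv("GAA", 1, "T"))   # GAA (Glu) to GTA (Val)
    print(classify_snv("CTT", 2, "C"))   # CTT (Leu) to CTC (Leu)
    print(classify_indel("A", "AT"))     # one-base insertion
```

In this sketch a change from one stop codon to another (for example TAA to TAG) is reported as synonymous, since the encoded outcome is unchanged.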

The most deleterious variants are stop-gain, frameshift, and splicing region variants

since they lead to loss of function. However, before we decide on a variant effect, we should